Overview
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 432 | 430 |
| Missing cells (%) | 8.1% | 8.0% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
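The missing-cell percentages in this table can be reproduced from its other rows, assuming the percentage is taken over all cells (observations × variables). A minimal Python sketch of that check follows; the helper name `missing_pct` is illustrative and is not part of ydata-profiling.

```python
def missing_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Missing cells as a percentage of all cells (assumed definition)."""
    return 100.0 * missing_cells / (n_rows * n_cols)


# Figures copied from the Dataset statistics table.
pct_a = missing_pct(432, 446, 12)  # Dataset A
pct_b = missing_pct(430, 446, 12)  # Dataset B

print(f"Dataset A: {pct_a:.1f}%  Dataset B: {pct_b:.1f}%")
```

Rounded to one decimal place, both values agree with the Missing cells (%) row.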
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
Alerts
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High correlation |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High correlation |
| Age has 92 (20.6%) missing values | Age has 85 (19.1%) missing values | Missing |
| Cabin has 339 (76.0%) missing values | Cabin has 345 (77.4%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 295 (66.1%) zeros | SibSp has 305 (68.4%) zeros | Zeros |
| Parch has 348 (78.0%) zeros | Parch has 338 (75.8%) zeros | Zeros |
| Fare has 7 (1.6%) zeros | Alert not present in this dataset | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2025-03-21 10:46:16.022118 | 2025-03-21 10:46:18.270670 |
| Analysis finished | 2025-03-21 10:46:18.267540 | 2025-03-21 10:46:20.503093 |
| Duration | 2.25 seconds | 2.23 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 441.31614 | 458.9574 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 3 |
| Maximum | 890 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 3 |
| 5-th percentile | 51.25 | 55.25 |
| Q1 | 225.5 | 248 |
| median | 441.5 | 468 |
| Q3 | 646.75 | 664.75 |
| 95-th percentile | 848.75 | 847.5 |
| Maximum | 890 | 891 |
| Range | 889 | 888 |
| Interquartile range (IQR) | 421.25 | 416.75 |
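The Range and Interquartile range (IQR) rows are derived from the other quantiles: range is maximum minus minimum, and IQR is Q3 minus Q1. A small Python sketch, using the PassengerId figures from this table (the function name `spread` is illustrative only), confirms that the reported values are consistent:

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> tuple[float, float]:
    """Return (range, IQR) computed from four quantile values."""
    return maximum - minimum, q3 - q1


# PassengerId quantiles copied from the table.
range_a, iqr_a = spread(1, 225.5, 646.75, 890)  # Dataset A
range_b, iqr_b = spread(3, 248, 664.75, 891)    # Dataset B

print(range_a, iqr_a)
print(range_b, iqr_b)
```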
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 252.61733 | 254.09038 |
| Coefficient of variation (CV) | 0.57241806 | 0.55362519 |
| Kurtosis | -1.1311608 | -1.1602973 |
| Mean | 441.31614 | 458.9574 |
| Median Absolute Deviation (MAD) | 213 | 209.5 |
| Skewness | 0.033611435 | -0.075809285 |
| Sum | 196827 | 204695 |
| Variance | 63815.516 | 64561.92 |
| Monotonicity | Not monotonic | Not monotonic |
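Several rows in this table are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the mean is the sum divided by the number of observations. The Python sketch below checks the PassengerId figures against those identities. It assumes the report follows these textbook definitions, and the tolerances only allow for the rounding of the printed values.

```python
N_OBS = 446  # number of observations in each dataset


def coefficient_of_variation(std: float, mean: float) -> float:
    """CV = standard deviation / mean."""
    return std / mean


# Dataset A figures copied from the table.
cv_a = coefficient_of_variation(252.61733, 441.31614)
var_a = 252.61733 ** 2
mean_a = 196827 / N_OBS

# Dataset B figures copied from the table.
cv_b = coefficient_of_variation(254.09038, 458.9574)
mean_b = 204695 / N_OBS

print(cv_a, var_a, mean_a)
print(cv_b, mean_b)
```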
Histogram with fixed size bins (bins=50)
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 371 | 1 | 0.2% |
| 629 | 1 | 0.2% |
| 587 | 1 | 0.2% |
| 708 | 1 | 0.2% |
| 181 | 1 | 0.2% |
| 73 | 1 | 0.2% |
| 352 | 1 | 0.2% |
| 520 | 1 | 0.2% |
| 164 | 1 | 0.2% |
| 766 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 461 | 1 | 0.2% |
| 420 | 1 | 0.2% |
| 687 | 1 | 0.2% |
| 885 | 1 | 0.2% |
| 320 | 1 | 0.2% |
| 354 | 1 | 0.2% |
| 238 | 1 | 0.2% |
| 427 | 1 | 0.2% |
| 163 | 1 | 0.2% |
| 402 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
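Extreme values (smallest first)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 13 | 1 | 0.2% |
| 17 | 1 | 0.2% |
| 18 | 1 | 0.2% |
| 19 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 14 | 1 | 0.2% |
| 16 | 1 | 0.2% |
| 19 | 1 | 0.2% |
| 21 | 1 | 0.2% |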
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 0 |
| 2nd row | 0 | 0 |
| 3rd row | 1 | 0 |
| 4th row | 0 | 1 |
| 5th row | 0 | 0 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 271 | 60.8% |
| 1 | 175 | 39.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 259 | 58.1% |
| 1 | 187 | 41.9% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 0 | 271 | |
| 1 | 175 |
| Value | Count | Frequency (%) |
| 0 | 259 | |
| 1 | 187 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 271 | |
| 1 | 175 |
| Value | Count | Frequency (%) |
| 0 | 259 | |
| 1 | 187 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 271 | |
| 1 | 175 |
| Value | Count | Frequency (%) |
| 0 | 259 | |
| 1 | 187 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 271 | |
| 1 | 175 |
| Value | Count | Frequency (%) |
| 0 | 259 | |
| 1 | 187 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 271 | |
| 1 | 175 |
| Value | Count | Frequency (%) |
| 0 | 259 | |
| 1 | 187 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 3 |
| 2nd row | 2 | 3 |
| 3rd row | 1 | 3 |
| 4th row | 3 | 1 |
| 5th row | 2 | 3 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 240 | 53.8% |
| 1 | 110 | 24.7% |
| 2 | 96 | 21.5% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 249 | 55.8% |
| 1 | 117 | 26.2% |
| 2 | 80 | 17.9% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 110 | |
| 2 | 96 | 21.5% |
| Value | Count | Frequency (%) |
| 3 | 249 | |
| 1 | 117 | |
| 2 | 80 | 17.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 110 | |
| 2 | 96 | 21.5% |
| Value | Count | Frequency (%) |
| 3 | 249 | |
| 1 | 117 | |
| 2 | 80 | 17.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 110 | |
| 2 | 96 | 21.5% |
| Value | Count | Frequency (%) |
| 3 | 249 | |
| 1 | 117 | |
| 2 | 80 | 17.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 110 | |
| 2 | 96 | 21.5% |
| Value | Count | Frequency (%) |
| 3 | 249 | |
| 1 | 117 | |
| 2 | 80 | 17.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 240 | |
| 1 | 110 | |
| 2 | 96 | 21.5% |
| Value | Count | Frequency (%) |
| 3 | 249 | |
| 1 | 117 | |
| 2 | 80 | 17.9% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 82 |
| Median length | 50 | 49 |
| Mean length | 27.076233 | 27.067265 |
| Min length | 12 | 12 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Bostandyeff, Mr. Guentcho | Van Impe, Miss. Catharina |
| 2nd row | Jarvis, Mr. John Denzil | Panula, Mr. Jaako Arnold |
| 3rd row | Calderhead, Mr. Edward Pennington | Sutehall, Mr. Henry Jr |
| 4th row | Sage, Miss. Constance Gladys | Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone) |
| 5th row | Hood, Mr. Ambrose Jr | Arnold-Franchi, Mr. Josef |
| Value | Count | Frequency (%) |
| mr | 255 | 13.9% |
| miss | 89 | 4.9% |
| mrs | 70 | 3.8% |
| william | 31 | 1.7% |
| john | 27 | 1.5% |
| master | 22 | 1.2% |
| henry | 20 | 1.1% |
| thomas | 15 | 0.8% |
| james | 11 | 0.6% |
| mary | 11 | 0.6% |
| Other values (895) | 1282 |
| Value | Count | Frequency (%) |
| mr | 253 | 13.9% |
| miss | 90 | 4.9% |
| mrs | 78 | 4.3% |
| william | 25 | 1.4% |
| john | 23 | 1.3% |
| master | 15 | 0.8% |
| charles | 14 | 0.8% |
| henry | 14 | 0.8% |
| thomas | 13 | 0.7% |
| george | 12 | 0.7% |
| Other values (906) | 1288 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1388 | 11.5% |
| r | 969 | 8.0% |
| e | 865 | 7.2% |
| a | 832 | 6.9% |
| s | 656 | 5.4% |
| i | 648 | 5.4% |
| n | 633 | 5.2% |
| M | 575 | 4.8% |
| l | 528 | 4.4% |
| o | 524 | 4.3% |
| Other values (50) | 4458 |
| Value | Count | Frequency (%) |
| (space) | 1381 | 11.4% |
| r | 972 | 8.1% |
| e | 850 | 7.0% |
| a | 847 | 7.0% |
| s | 663 | 5.5% |
| i | 661 | 5.5% |
| n | 656 | 5.4% |
| M | 559 | 4.6% |
| l | 559 | 4.6% |
| o | 504 | 4.2% |
| Other values (50) | 4420 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 12076 |
| Value | Count | Frequency (%) |
| (unknown) | 12072 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1388 | 11.5% |
| r | 969 | 8.0% |
| e | 865 | 7.2% |
| a | 832 | 6.9% |
| s | 656 | 5.4% |
| i | 648 | 5.4% |
| n | 633 | 5.2% |
| M | 575 | 4.8% |
| l | 528 | 4.4% |
| o | 524 | 4.3% |
| Other values (50) | 4458 |
| Value | Count | Frequency (%) |
| (space) | 1381 | 11.4% |
| r | 972 | 8.1% |
| e | 850 | 7.0% |
| a | 847 | 7.0% |
| s | 663 | 5.5% |
| i | 661 | 5.5% |
| n | 656 | 5.4% |
| M | 559 | 4.6% |
| l | 559 | 4.6% |
| o | 504 | 4.2% |
| Other values (50) | 4420 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 12076 |
| Value | Count | Frequency (%) |
| (unknown) | 12072 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1388 | 11.5% |
| r | 969 | 8.0% |
| e | 865 | 7.2% |
| a | 832 | 6.9% |
| s | 656 | 5.4% |
| i | 648 | 5.4% |
| n | 633 | 5.2% |
| M | 575 | 4.8% |
| l | 528 | 4.4% |
| o | 524 | 4.3% |
| Other values (50) | 4458 |
| Value | Count | Frequency (%) |
| (space) | 1381 | 11.4% |
| r | 972 | 8.1% |
| e | 850 | 7.0% |
| a | 847 | 7.0% |
| s | 663 | 5.5% |
| i | 661 | 5.5% |
| n | 656 | 5.4% |
| M | 559 | 4.6% |
| l | 559 | 4.6% |
| o | 504 | 4.2% |
| Other values (50) | 4420 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 12076 |
| Value | Count | Frequency (%) |
| (unknown) | 12072 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1388 | 11.5% |
| r | 969 | 8.0% |
| e | 865 | 7.2% |
| a | 832 | 6.9% |
| s | 656 | 5.4% |
| i | 648 | 5.4% |
| n | 633 | 5.2% |
| M | 575 | 4.8% |
| l | 528 | 4.4% |
| o | 524 | 4.3% |
| Other values (50) | 4458 |
| Value | Count | Frequency (%) |
| (space) | 1381 | 11.4% |
| r | 972 | 8.1% |
| e | 850 | 7.0% |
| a | 847 | 7.0% |
| s | 663 | 5.5% |
| i | 661 | 5.5% |
| n | 656 | 5.4% |
| M | 559 | 4.6% |
| l | 559 | 4.6% |
| o | 504 | 4.2% |
| Other values (50) | 4420 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7219731 | 4.7533632 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | female |
| 2nd row | male | male |
| 3rd row | male | male |
| 4th row | female | female |
| 5th row | male | male |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| male | 285 | 63.9% |
| female | 161 | 36.1% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| male | 278 | 62.3% |
| female | 168 | 37.7% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| male | 285 | |
| female | 161 |
| Value | Count | Frequency (%) |
| male | 278 | |
| female | 168 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 614 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 168 | 7.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
| Value | Count | Frequency (%) |
| (unknown) | 2120 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 614 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 168 | 7.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
| Value | Count | Frequency (%) |
| (unknown) | 2120 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 614 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 168 | 7.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2106 |
| Value | Count | Frequency (%) |
| (unknown) | 2120 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 607 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 161 | 7.6% |
| Value | Count | Frequency (%) |
| e | 614 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 168 | 7.9% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 72 | 68 |
| Distinct (%) | 20.3% | 18.8% |
| Missing | 92 | 85 |
| Missing (%) | 20.6% | 19.1% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.978107 | 29.676122 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| Maximum | 71 | 71 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.42 |
| 5-th percentile | 4 | 4 |
| Q1 | 21 | 21 |
| median | 30 | 28 |
| Q3 | 38 | 38 |
| 95-th percentile | 54.525 | 57 |
| Maximum | 71 | 71 |
| Range | 70.58 | 70.58 |
| Interquartile range (IQR) | 17 | 17 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.44238 | 14.50826 |
| Coefficient of variation (CV) | 0.48176425 | 0.48888666 |
| Kurtosis | -0.038763589 | 0.077829294 |
| Mean | 29.978107 | 29.676122 |
| Median Absolute Deviation (MAD) | 9 | 8 |
| Skewness | 0.20709079 | 0.31967258 |
| Sum | 10612.25 | 10713.08 |
| Variance | 208.58235 | 210.48961 |
| Monotonicity | Not monotonic | Not monotonic |
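For Age, the reported mean and Distinct (%) appear to be computed over the non-missing observations only, not over all 446 rows: dividing by 446 minus the Missing count reproduces both figures. The Python sketch below illustrates that reading. It is an inference from the numbers in this report, not a statement about the profiling library's internals, and the helper name `non_missing` is illustrative.

```python
N_OBS = 446  # observations per dataset


def non_missing(n_missing: int) -> int:
    """Number of rows that actually carry a value."""
    return N_OBS - n_missing


# Dataset A: 92 missing ages; Dataset B: 85 missing ages.
mean_a = 10612.25 / non_missing(92)
mean_b = 10713.08 / non_missing(85)

# Distinct (%) uses the same denominator.
distinct_pct_a = 100 * 72 / non_missing(92)
distinct_pct_b = 100 * 68 / non_missing(85)

print(f"{mean_a:.6f} {mean_b:.6f}")
print(f"{distinct_pct_a:.1f}% {distinct_pct_b:.1f}%")
```

The per-value Frequency (%) columns, by contrast, match a denominator of all 446 rows (for example, 1 occurrence is shown as 0.2%).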
Histogram with fixed size bins (bins=50)
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 16 | 3.6% |
| 22 | 15 | 3.4% |
| 30 | 14 | 3.1% |
| 36 | 13 | 2.9% |
| 28 | 13 | 2.9% |
| 17 | 11 | 2.5% |
| 27 | 11 | 2.5% |
| 33 | 11 | 2.5% |
| 31 | 11 | 2.5% |
| 24 | 11 | 2.5% |
| Other values (62) | 228 | 51.1% |
| (Missing) | 92 | 20.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 21 | 17 | 3.8% |
| 24 | 17 | 3.8% |
| 30 | 14 | 3.1% |
| 35 | 14 | 3.1% |
| 25 | 13 | 2.9% |
| 26 | 12 | 2.7% |
| 36 | 11 | 2.5% |
| 19 | 11 | 2.5% |
| 18 | 11 | 2.5% |
| 22 | 11 | 2.5% |
| Other values (58) | 230 | 51.6% |
| (Missing) | 85 | 19.1% |
Extreme values (smallest first)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0.42 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 1 | 0.2% |
| 1 | 3 | 0.7% |
| 2 | 6 | 1.3% |
| 3 | 4 | 0.9% |
| 4 | 5 | 1.1% |
| 5 | 2 | 0.4% |
| 6 | 2 | 0.4% |
| 7 | 3 | 0.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0.42 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 2 | 0.4% |
| 1 | 5 | 1.1% |
| 2 | 6 | 1.3% |
| 3 | 2 | 0.4% |
| 4 | 5 | 1.1% |
| 5 | 1 | 0.2% |
| 7 | 2 | 0.4% |
| 8 | 3 | 0.7% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.52466368 | 0.48206278 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 295 | 305 |
| Zeros (%) | 66.1% | 68.4% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 2 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.0224763 | 0.98910526 |
| Coefficient of variation (CV) | 1.9488223 | 2.0518184 |
| Kurtosis | 15.755013 | 18.600935 |
| Mean | 0.52466368 | 0.48206278 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.3730613 | 3.6610549 |
| Sum | 234 | 215 |
| Variance | 1.0454578 | 0.97832922 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 295 | 66.1% |
| 1 | 115 | 25.8% |
| 2 | 14 | 3.1% |
| 4 | 11 | 2.5% |
| 3 | 7 | 1.6% |
| 8 | 2 | 0.4% |
| 5 | 2 | 0.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 305 | 68.4% |
| 1 | 109 | 24.4% |
| 2 | 13 | 2.9% |
| 4 | 7 | 1.6% |
| 3 | 7 | 1.6% |
| 5 | 3 | 0.7% |
| 8 | 2 | 0.4% |
Extreme values (smallest first)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 295 | 66.1% |
| 1 | 115 | 25.8% |
| 2 | 14 | 3.1% |
| 3 | 7 | 1.6% |
| 4 | 11 | 2.5% |
| 5 | 2 | 0.4% |
| 8 | 2 | 0.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 305 | 68.4% |
| 1 | 109 | 24.4% |
| 2 | 13 | 2.9% |
| 3 | 7 | 1.6% |
| 4 | 7 | 1.6% |
| 5 | 3 | 0.7% |
| 8 | 2 | 0.4% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 6 |
| Distinct (%) | 1.3% | 1.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.31838565 | 0.39910314 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 5 |
| Zeros | 348 | 338 |
| Zeros (%) | 78.0% | 75.8% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 5 |
| Range | 5 | 5 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.67838082 | 0.8519108 |
| Coefficient of variation (CV) | 2.130689 | 2.134563 |
| Kurtosis | 7.8658212 | 9.7245775 |
| Mean | 0.31838565 | 0.39910314 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.5017672 | 2.814778 |
| Sum | 142 | 178 |
| Variance | 0.46020053 | 0.725752 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=6)
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 348 | 78.0% |
| 1 | 60 | 13.5% |
| 2 | 35 | 7.8% |
| 5 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 3 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 338 | 75.8% |
| 1 | 59 | 13.2% |
| 2 | 40 | 9.0% |
| 5 | 5 | 1.1% |
| 4 | 2 | 0.4% |
| 3 | 2 | 0.4% |
Extreme values (smallest first)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 348 | 78.0% |
| 1 | 60 | 13.5% |
| 2 | 35 | 7.8% |
| 3 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 338 | 75.8% |
| 1 | 59 | 13.2% |
| 2 | 40 | 9.0% |
| 3 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| 5 | 5 | 1.1% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 383 | 376 |
| Distinct (%) | 85.9% | 84.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.67713 | 6.7556054 |
| Min length | 3 | 3 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 333 | 319 |
| Unique (%) | 74.7% | 71.5% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 349224 | 345773 |
| 2nd row | 237565 | 3101295 |
| 3rd row | PC 17476 | SOTON/OQ 392076 |
| 4th row | CA. 2343 | 16966 |
| 5th row | S.O.C. 14879 | 349237 |
| Value | Count | Frequency (%) |
| pc | 28 | 5.0% |
| c.a | 15 | 2.7% |
| a/5 | 9 | 1.6% |
| sc/paris | 6 | 1.1% |
| w./c | 5 | 0.9% |
| soton/o.q | 5 | 0.9% |
| c | 5 | 0.9% |
| 2 | 4 | 0.7% |
| ca | 4 | 0.7% |
| a/4 | 4 | 0.7% |
| Other values (401) | 476 |
| Value | Count | Frequency (%) |
| pc | 33 | 5.8% |
| c.a | 17 | 3.0% |
| ca | 6 | 1.1% |
| soton/oq | 6 | 1.1% |
| ston/o | 6 | 1.1% |
| 2 | 6 | 1.1% |
| ston/o2 | 4 | 0.7% |
| a/4 | 4 | 0.7% |
| 1601 | 4 | 0.7% |
| 3101295 | 4 | 0.7% |
| Other values (392) | 476 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 394 | |
| 1 | 345 | |
| 2 | 287 | |
| 7 | 264 | |
| 4 | 229 | |
| 6 | 221 | 7.4% |
| 0 | 206 | 6.9% |
| 5 | 198 | 6.6% |
| 9 | 143 | 4.8% |
| 8 | 127 | 4.3% |
| Other values (25) | 564 |
| Value | Count | Frequency (%) |
| 3 | 378 | |
| 1 | 357 | |
| 2 | 296 | |
| 7 | 248 | |
| 4 | 220 | 7.3% |
| 6 | 218 | 7.2% |
| 5 | 193 | 6.4% |
| 0 | 192 | 6.4% |
| 9 | 165 | 5.5% |
| 8 | 132 | 4.4% |
| Other values (22) | 614 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2978 |
| Value | Count | Frequency (%) |
| (unknown) | 3013 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 394 | |
| 1 | 345 | |
| 2 | 287 | |
| 7 | 264 | |
| 4 | 229 | |
| 6 | 221 | 7.4% |
| 0 | 206 | 6.9% |
| 5 | 198 | 6.6% |
| 9 | 143 | 4.8% |
| 8 | 127 | 4.3% |
| Other values (25) | 564 |
| Value | Count | Frequency (%) |
| 3 | 378 | |
| 1 | 357 | |
| 2 | 296 | |
| 7 | 248 | |
| 4 | 220 | 7.3% |
| 6 | 218 | 7.2% |
| 5 | 193 | 6.4% |
| 0 | 192 | 6.4% |
| 9 | 165 | 5.5% |
| 8 | 132 | 4.4% |
| Other values (22) | 614 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2978 |
| Value | Count | Frequency (%) |
| (unknown) | 3013 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 394 | |
| 1 | 345 | |
| 2 | 287 | |
| 7 | 264 | |
| 4 | 229 | |
| 6 | 221 | 7.4% |
| 0 | 206 | 6.9% |
| 5 | 198 | 6.6% |
| 9 | 143 | 4.8% |
| 8 | 127 | 4.3% |
| Other values (25) | 564 |
| Value | Count | Frequency (%) |
| 3 | 378 | |
| 1 | 357 | |
| 2 | 296 | |
| 7 | 248 | |
| 4 | 220 | 7.3% |
| 6 | 218 | 7.2% |
| 5 | 193 | 6.4% |
| 0 | 192 | 6.4% |
| 9 | 165 | 5.5% |
| 8 | 132 | 4.4% |
| Other values (22) | 614 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2978 |
| Value | Count | Frequency (%) |
| (unknown) | 3013 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 394 | |
| 1 | 345 | |
| 2 | 287 | |
| 7 | 264 | |
| 4 | 229 | |
| 6 | 221 | 7.4% |
| 0 | 206 | 6.9% |
| 5 | 198 | 6.6% |
| 9 | 143 | 4.8% |
| 8 | 127 | 4.3% |
| Other values (25) | 564 |
| Value | Count | Frequency (%) |
| 3 | 378 | |
| 1 | 357 | |
| 2 | 296 | |
| 7 | 248 | |
| 4 | 220 | 7.3% |
| 6 | 218 | 7.2% |
| 5 | 193 | 6.4% |
| 0 | 192 | 6.4% |
| 9 | 165 | 5.5% |
| 8 | 132 | 4.4% |
| Other values (22) | 614 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 184 | 181 |
| Distinct (%) | 41.3% | 40.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 31.027074 | 36.49358 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 7 | 3 |
| Zeros (%) | 1.6% | 0.7% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.225 |
| Q1 | 7.925 | 8.05 |
| median | 13.8604 | 15.5 |
| Q3 | 30.64685 | 31.3875 |
| 95-th percentile | 108.28125 | 134.5 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 22.72185 | 23.3375 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 45.068833 | 60.602504 |
| Coefficient of variation (CV) | 1.4525647 | 1.6606346 |
| Kurtosis | 34.786731 | 27.308604 |
| Mean | 31.027074 | 36.49358 |
| Median Absolute Deviation (MAD) | 6.46875 | 7.7646 |
| Skewness | 4.7028866 | 4.5682468 |
| Sum | 13838.075 | 16276.137 |
| Variance | 2031.1997 | 3672.6634 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 8.05 | 26 | 5.8% |
| 13 | 24 | 5.4% |
| 7.8958 | 23 | 5.2% |
| 7.75 | 17 | 3.8% |
| 26 | 15 | 3.4% |
| 10.5 | 11 | 2.5% |
| 7.25 | 9 | 2.0% |
| 7.925 | 8 | 1.8% |
| 0 | 7 | 1.6% |
| 8.6625 | 7 | 1.6% |
| Other values (174) | 299 | 67.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 8.05 | 22 | 4.9% |
| 7.8958 | 21 | 4.7% |
| 13 | 17 | 3.8% |
| 26 | 16 | 3.6% |
| 7.75 | 14 | 3.1% |
| 7.925 | 11 | 2.5% |
| 10.5 | 10 | 2.2% |
| 7.225 | 9 | 2.0% |
| 26.55 | 9 | 2.0% |
| 7.2292 | 8 | 1.8% |
| Other values (171) | 309 | 69.3% |
Extreme values (smallest first)
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7 | 1.6% |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.45 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| 7.125 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3 | 0.7% |
| 5 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.75 | 2 | 0.4% |
| 6.95 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| 7.125 | 1 | 0.2% |
| 7.1417 | 1 | 0.2% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 90 | 85 |
| Distinct (%) | 84.1% | 84.2% |
| Missing | 339 | 345 |
| Missing (%) | 76.0% | 77.4% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.5981308 | 3.7227723 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 76 | 71 |
| Unique (%) | 71.0% | 70.3% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | E24 | E34 |
| 2nd row | C128 | G6 |
| 3rd row | D11 | F33 |
| 4th row | C91 | D26 |
| 5th row | C50 | B51 B53 B55 |
| Value | Count | Frequency (%) |
| b96 | 4 | 3.2% |
| b98 | 4 | 3.2% |
| f33 | 3 | 2.4% |
| f | 3 | 2.4% |
| b18 | 2 | 1.6% |
| e44 | 2 | 1.6% |
| e25 | 2 | 1.6% |
| e67 | 2 | 1.6% |
| e8 | 2 | 1.6% |
| d36 | 2 | 1.6% |
| Other values (91) | 98 |
| Value | Count | Frequency (%) |
| c23 | 4 | 3.3% |
| c25 | 4 | 3.3% |
| c27 | 4 | 3.3% |
| d26 | 2 | 1.7% |
| f33 | 2 | 1.7% |
| b51 | 2 | 1.7% |
| b53 | 2 | 1.7% |
| b55 | 2 | 1.7% |
| g6 | 2 | 1.7% |
| e25 | 2 | 1.7% |
| Other values (86) | 95 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 42 | 10.9% |
| B | 39 | 10.1% |
| 3 | 32 | 8.3% |
| C | 31 | 8.1% |
| 2 | 26 | 6.8% |
| 6 | 26 | 6.8% |
| 8 | 24 | 6.2% |
| 0 | 22 | 5.7% |
| 7 | 20 | 5.2% |
| 9 | 18 | 4.7% |
| Other values (8) | 105 |
| Value | Count | Frequency (%) |
| C | 42 | |
| 2 | 42 | |
| B | 33 | 8.8% |
| 3 | 30 | 8.0% |
| 5 | 30 | 8.0% |
| 6 | 26 | 6.9% |
| 1 | 25 | 6.6% |
| (space) | 20 | 5.3% |
| 4 | 19 | 5.1% |
| 7 | 19 | 5.1% |
| Other values (8) | 90 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 385 |
| Value | Count | Frequency (%) |
| (unknown) | 376 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 1 | 42 | 10.9% |
| B | 39 | 10.1% |
| 3 | 32 | 8.3% |
| C | 31 | 8.1% |
| 2 | 26 | 6.8% |
| 6 | 26 | 6.8% |
| 8 | 24 | 6.2% |
| 0 | 22 | 5.7% |
| 7 | 20 | 5.2% |
| 9 | 18 | 4.7% |
| Other values (8) | 105 |
| Value | Count | Frequency (%) |
| C | 42 | |
| 2 | 42 | |
| B | 33 | 8.8% |
| 3 | 30 | 8.0% |
| 5 | 30 | 8.0% |
| 6 | 26 | 6.9% |
| 1 | 25 | 6.6% |
| (space) | 20 | 5.3% |
| 4 | 19 | 5.1% |
| 7 | 19 | 5.1% |
| Other values (8) | 90 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 385 |
| Value | Count | Frequency (%) |
| (unknown) | 376 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 1 | 42 | 10.9% |
| B | 39 | 10.1% |
| 3 | 32 | 8.3% |
| C | 31 | 8.1% |
| 2 | 26 | 6.8% |
| 6 | 26 | 6.8% |
| 8 | 24 | 6.2% |
| 0 | 22 | 5.7% |
| 7 | 20 | 5.2% |
| 9 | 18 | 4.7% |
| Other values (8) | 105 |
| Value | Count | Frequency (%) |
| C | 42 | |
| 2 | 42 | |
| B | 33 | 8.8% |
| 3 | 30 | 8.0% |
| 5 | 30 | 8.0% |
| 6 | 26 | 6.9% |
| 1 | 25 | 6.6% |
| (space) | 20 | 5.3% |
| 4 | 19 | 5.1% |
| 7 | 19 | 5.1% |
| Other values (8) | 90 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 385 |
| Value | Count | Frequency (%) |
| (unknown) | 376 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 1 | 42 | 10.9% |
| B | 39 | 10.1% |
| 3 | 32 | 8.3% |
| C | 31 | 8.1% |
| 2 | 26 | 6.8% |
| 6 | 26 | 6.8% |
| 8 | 24 | 6.2% |
| 0 | 22 | 5.7% |
| 7 | 20 | 5.2% |
| 9 | 18 | 4.7% |
| Other values (8) | 105 |
| Value | Count | Frequency (%) |
| C | 42 | |
| 2 | 42 | |
| B | 33 | 8.8% |
| 3 | 30 | 8.0% |
| 5 | 30 | 8.0% |
| 6 | 26 | 6.9% |
| 1 | 25 | 6.6% |
| (space) | 20 | 5.3% |
| 4 | 19 | 5.1% |
| 7 | 19 | 5.1% |
| Other values (8) | 90 |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 0 |
| Missing (%) | 0.2% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| Dataset A | Dataset B | |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| Dataset A | Dataset B | |
|---|---|---|
| 1st row | S | S |
| 2nd row | S | S |
| 3rd row | S | S |
| 4th row | S | C |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 314 | 70.4% |
| C | 88 | 19.7% |
| Q | 43 | 9.6% |
| (Missing) | 1 | 0.2% |
| Value | Count | Frequency (%) |
| S | 313 | 70.2% |
| C | 100 | 22.4% |
| Q | 33 | 7.4% |
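The percentages in the Common Values tables can be reproduced from the counts alone. The following is a minimal sketch using only the Python standard library; the helper name `frequency_table` and the use of `None` to mark a missing cell are assumptions made for illustration and are not taken from ydata-profiling itself.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, most common first.

    None marks a missing cell (an assumption of this sketch) and is
    reported under the label "(Missing)".
    """
    counts = Counter("(Missing)" if v is None else v for v in values)
    total = len(values)  # missing cells are included in the denominator
    return [(v, n, round(100 * n / total, 1)) for v, n in counts.most_common()]


# Rebuild Dataset A's Embarked column from the counts reported above.
embarked_a = ["S"] * 314 + ["C"] * 88 + ["Q"] * 43 + [None]

for row in frequency_table(embarked_a):
    print(row)
```

Because the missing cell stays in the denominator (446 observations), C comes out at 19.7% here, whereas the character-level tables further down divide by the 445 non-missing values and report 19.8%.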
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| s | 314 | 70.6% |
| c | 88 | 19.8% |
| q | 43 | 9.7% |
| Value | Count | Frequency (%) |
| s | 313 | 70.2% |
| c | 100 | 22.4% |
| q | 33 | 7.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 314 | 70.6% |
| C | 88 | 19.8% |
| Q | 43 | 9.7% |
| Value | Count | Frequency (%) |
| S | 313 | 70.2% |
| C | 100 | 22.4% |
| Q | 33 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| S | 314 | 70.6% |
| C | 88 | 19.8% |
| Q | 43 | 9.7% |
| Value | Count | Frequency (%) |
| S | 313 | 70.2% |
| C | 100 | 22.4% |
| Q | 33 | 7.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| S | 314 | 70.6% |
| C | 88 | 19.8% |
| Q | 43 | 9.7% |
| Value | Count | Frequency (%) |
| S | 313 | 70.2% |
| C | 100 | 22.4% |
| Q | 33 | 7.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 446 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| S | 314 | 70.6% |
| C | 88 | 19.8% |
| Q | 43 | 9.7% |
| Value | Count | Frequency (%) |
| S | 313 | 70.2% |
| C | 100 | 22.4% |
| Q | 33 | 7.4% |
Interactions
Interaction plots for Dataset A and Dataset B (25 plot pairs) are not reproduced here.
Correlations
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.085 | 0.115 | -0.314 | 0.038 | 0.259 | 0.000 | -0.206 | 0.118 |
| Embarked | 0.085 | 1.000 | 0.200 | 0.000 | 0.000 | 0.265 | 0.105 | 0.088 | 0.220 |
| Fare | 0.115 | 0.200 | 1.000 | 0.367 | 0.013 | 0.491 | 0.216 | 0.448 | 0.332 |
| Parch | -0.314 | 0.000 | 0.367 | 1.000 | 0.026 | 0.000 | 0.217 | 0.402 | 0.153 |
| PassengerId | 0.038 | 0.000 | 0.013 | 0.026 | 1.000 | 0.000 | 0.051 | -0.055 | 0.101 |
| Pclass | 0.259 | 0.265 | 0.491 | 0.000 | 0.000 | 1.000 | 0.160 | 0.166 | 0.397 |
| Sex | 0.000 | 0.105 | 0.216 | 0.217 | 0.051 | 0.160 | 1.000 | 0.195 | 0.537 |
| SibSp | -0.206 | 0.088 | 0.448 | 0.402 | -0.055 | 0.166 | 0.195 | 1.000 | 0.195 |
| Survived | 0.118 | 0.220 | 0.332 | 0.153 | 0.101 | 0.397 | 0.537 | 0.195 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.155 | -0.208 | 0.054 | 0.239 | 0.098 | -0.181 | 0.192 |
| Embarked | 0.000 | 1.000 | 0.205 | 0.084 | 0.000 | 0.252 | 0.066 | 0.000 | 0.138 |
| Fare | 0.155 | 0.205 | 1.000 | 0.395 | -0.093 | 0.463 | 0.229 | 0.416 | 0.322 |
| Parch | -0.208 | 0.084 | 0.395 | 1.000 | -0.051 | 0.029 | 0.268 | 0.367 | 0.149 |
| PassengerId | 0.054 | 0.000 | -0.093 | -0.051 | 1.000 | 0.000 | 0.051 | -0.073 | 0.103 |
| Pclass | 0.239 | 0.252 | 0.463 | 0.029 | 0.000 | 1.000 | 0.177 | 0.092 | 0.366 |
| Sex | 0.098 | 0.066 | 0.229 | 0.268 | 0.051 | 0.177 | 1.000 | 0.204 | 0.552 |
| SibSp | -0.181 | 0.000 | 0.416 | 0.367 | -0.073 | 0.092 | 0.204 | 1.000 | 0.166 |
| Survived | 0.192 | 0.138 | 0.322 | 0.149 | 0.103 | 0.366 | 0.552 | 0.166 | 1.000 |
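The only "High correlation" alert in the overview concerns Sex and Survived, which matches the largest off-diagonal entries in both tables (0.537 and 0.552). As a rough sketch of how such an alert could be derived from the matrix, the snippet below filters variable pairs by absolute coefficient. The 0.5 cutoff and the function name `high_correlation_pairs` are assumptions for illustration; the report does not state the threshold it uses.

```python
# Assumed cutoff for a "High correlation" alert; not stated in the report.
HIGH_CORRELATION_THRESHOLD = 0.5

# The five largest off-diagonal coefficients from the Dataset A table above.
corr_a = {
    ("Sex", "Survived"): 0.537,
    ("Fare", "Pclass"): 0.491,
    ("Fare", "SibSp"): 0.448,
    ("Parch", "SibSp"): 0.402,
    ("Pclass", "Survived"): 0.397,
}


def high_correlation_pairs(corr, threshold=HIGH_CORRELATION_THRESHOLD):
    """Return the variable pairs whose absolute coefficient reaches the threshold."""
    return sorted(pair for pair, r in corr.items() if abs(r) >= threshold)


print(high_correlation_pairs(corr_a))  # [('Sex', 'Survived')]
```

With this cutoff, Fare and Pclass (0.491) fall just short, which is consistent with the overview listing no alert for that pair.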
Missing values
Dataset A
A simple visualization of nullity by column.
Dataset B
A simple visualization of nullity by column.
Dataset A
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Dataset B
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Dataset A
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
Dataset B
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
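Nullity correlation can be read as an ordinary Pearson correlation computed on missing-value indicators (1 where a cell is missing, 0 otherwise). The sketch below illustrates that idea with the standard library only. The function name, the use of `None` for missing cells, and the toy columns are assumptions for illustration; the values are made up and are not taken from either dataset.

```python
from math import sqrt


def nullity_correlation(col_x, col_y):
    """Pearson correlation between the missing-value indicators of two columns.

    Returns None when either column is entirely present or entirely missing,
    because the indicator then has zero variance and the coefficient is undefined.
    """
    x = [1.0 if v is None else 0.0 for v in col_x]
    y = [1.0 if v is None else 0.0 for v in col_y]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Toy columns with made-up values; None marks a missing cell.
age = [22.0, None, 26.0, None, 35.0, 54.0]
cabin = [None, None, "C85", None, "E46", None]

print(nullity_correlation(age, cabin))
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present where the other is missing, and a value near 0 means the missingness patterns are unrelated.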
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 628 | 629 | 0 | 3 | Bostandyeff, Mr. Guentcho | male | 26.0 | 0 | 0 | 349224 | 7.8958 | NaN | S |
| 586 | 587 | 0 | 2 | Jarvis, Mr. John Denzil | male | 47.0 | 0 | 0 | 237565 | 15.0000 | NaN | S |
| 707 | 708 | 1 | 1 | Calderhead, Mr. Edward Pennington | male | 42.0 | 0 | 0 | PC 17476 | 26.2875 | E24 | S |
| 180 | 181 | 0 | 3 | Sage, Miss. Constance Gladys | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S |
| 72 | 73 | 0 | 2 | Hood, Mr. Ambrose Jr | male | 21.0 | 0 | 0 | S.O.C. 14879 | 73.5000 | NaN | S |
| 351 | 352 | 0 | 1 | Williams-Lambert, Mr. Fletcher Fellows | male | NaN | 0 | 0 | 113510 | 35.0000 | C128 | S |
| 519 | 520 | 0 | 3 | Pavlovic, Mr. Stefo | male | 32.0 | 0 | 0 | 349242 | 7.8958 | NaN | S |
| 163 | 164 | 0 | 3 | Calic, Mr. Jovo | male | 17.0 | 0 | 0 | 315093 | 8.6625 | NaN | S |
| 765 | 766 | 1 | 1 | Hogeboom, Mrs. John C (Anna Andrews) | female | 51.0 | 1 | 0 | 13502 | 77.9583 | D11 | S |
| 387 | 388 | 1 | 2 | Buss, Miss. Kate | female | 36.0 | 0 | 0 | 27849 | 13.0000 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 419 | 420 | 0 | 3 | Van Impe, Miss. Catharina | female | 10.00 | 0 | 2 | 345773 | 24.1500 | NaN | S |
| 686 | 687 | 0 | 3 | Panula, Mr. Jaako Arnold | male | 14.00 | 4 | 1 | 3101295 | 39.6875 | NaN | S |
| 884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.00 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S |
| 319 | 320 | 1 | 1 | Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone) | female | 40.00 | 1 | 1 | 16966 | 134.5000 | E34 | C |
| 353 | 354 | 0 | 3 | Arnold-Franchi, Mr. Josef | male | 25.00 | 1 | 0 | 349237 | 17.8000 | NaN | S |
| 237 | 238 | 1 | 2 | Collyer, Miss. Marjorie "Lottie" | female | 8.00 | 0 | 2 | C.A. 31921 | 26.2500 | NaN | S |
| 426 | 427 | 1 | 2 | Clarke, Mrs. Charles V (Ada Maria Winfield) | female | 28.00 | 1 | 0 | 2003 | 26.0000 | NaN | S |
| 162 | 163 | 0 | 3 | Bengtsson, Mr. John Viktor | male | 26.00 | 0 | 0 | 347068 | 7.7750 | NaN | S |
| 401 | 402 | 0 | 3 | Adams, Mr. John | male | 26.00 | 0 | 0 | 341826 | 8.0500 | NaN | S |
| 469 | 470 | 1 | 3 | Baclini, Miss. Helene Barbara | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 354 | 355 | 0 | 3 | Yousif, Mr. Wazli | male | NaN | 0 | 0 | 2647 | 7.2250 | NaN | C |
| 286 | 287 | 1 | 3 | de Mulder, Mr. Theodore | male | 30.0 | 0 | 0 | 345774 | 9.5000 | NaN | S |
| 134 | 135 | 0 | 2 | Sobey, Mr. Samuel James Hayden | male | 25.0 | 0 | 0 | C.A. 29178 | 13.0000 | NaN | S |
| 243 | 244 | 0 | 3 | Maenpaa, Mr. Matti Alexanteri | male | 22.0 | 0 | 0 | STON/O 2. 3101275 | 7.1250 | NaN | S |
| 732 | 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN | 0 | 0 | 239855 | 0.0000 | NaN | S |
| 183 | 184 | 1 | 2 | Becker, Master. Richard F | male | 1.0 | 2 | 1 | 230136 | 39.0000 | F4 | S |
| 521 | 522 | 0 | 3 | Vovk, Mr. Janko | male | 22.0 | 0 | 0 | 349252 | 7.8958 | NaN | S |
| 868 | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 9.5000 | NaN | S |
| 844 | 845 | 0 | 3 | Culumovic, Mr. Jeso | male | 17.0 | 0 | 0 | 315090 | 8.6625 | NaN | S |
| 370 | 371 | 1 | 1 | Harder, Mr. George Achilles | male | 25.0 | 1 | 0 | 11765 | 55.4417 | E50 | C |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 629 | 630 | 0 | 3 | O'Connell, Mr. Patrick D | male | NaN | 0 | 0 | 334912 | 7.7333 | NaN | Q |
| 147 | 148 | 0 | 3 | Ford, Miss. Robina Maggie "Ruby" | female | 9.0 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S |
| 258 | 259 | 1 | 1 | Ward, Miss. Anna | female | 35.0 | 0 | 0 | PC 17755 | 512.3292 | NaN | C |
| 766 | 767 | 0 | 1 | Brewe, Dr. Arthur Jackson | male | NaN | 0 | 0 | 112379 | 39.6000 | NaN | C |
| 660 | 661 | 1 | 1 | Frauenthal, Dr. Henry William | male | 50.0 | 2 | 0 | PC 17611 | 133.6500 | NaN | S |
| 587 | 588 | 1 | 1 | Frolicher-Stehli, Mr. Maxmillian | male | 60.0 | 1 | 1 | 13567 | 79.2000 | B41 | C |
| 694 | 695 | 0 | 1 | Weir, Col. John | male | 60.0 | 0 | 0 | 113800 | 26.5500 | NaN | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
| 854 | 855 | 0 | 2 | Carter, Mrs. Ernest Courtenay (Lilian Hughes) | female | 44.0 | 1 | 0 | 244252 | 26.0000 | NaN | S |
| 460 | 461 | 1 | 1 | Anderson, Mr. Harry | male | 48.0 | 0 | 0 | 19952 | 26.5500 | E12 | S |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.
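A duplicate-row check of this kind amounts to counting how often each complete row occurs. The sketch below shows one way to do that with the standard library; the function name `duplicate_rows` is hypothetical, and missing cells are represented as `None` rather than NaN (an assumption of this sketch, chosen because NaN does not compare equal to itself). The two rows are taken from the Dataset A sample above.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Two rows from the Dataset A sample; None stands in for NaN.
rows = [
    (629, 0, 3, "Bostandyeff, Mr. Guentcho", "male", 26.0, 0, 0, "349224", 7.8958, None, "S"),
    (587, 0, 2, "Jarvis, Mr. John Denzil", "male", 47.0, 0, 0, "237565", 15.0, None, "S"),
]

print(duplicate_rows(rows))              # {} : no row repeats
print(duplicate_rows(rows + [rows[0]]))  # the first row is reported with a count of 2
```

Since PassengerId and Name are flagged as unique in both datasets, no two full rows can be identical, which agrees with the report finding zero duplicates.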